#loading the necessary packages
pacman::p_load(tmap, sf, tidyverse, spdep, sfdep, DT, knitr)
Take-home Exercise 1: Geospatial Analytics for Public Good
Introduction
Digitization of urban infrastructure has introduced new datasets that track movement patterns in both space and time through GPS and RFID. Understanding these datasets can potentially lead to more informed decision-making in urban management and planning.
The purpose of this exercise is to perform Exploratory Spatial Data Analysis (ESDA), via appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA), to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
The TASK
Geovisualisation and Analysis
With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level,
Peak hour period and station entry time:
Weekday morning peak: 6am to 9am
Weekday afternoon peak: 5pm to 8pm
Weekend/holiday morning peak: 11am to 2pm
Weekend/holiday evening peak: 4pm to 7pm
Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,
Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).
Local Indicators of Spatial Association (LISA) Analysis
Compute LISA of the passenger trips generated by origin at the hexagon level.
Display the LISA maps of the passenger trips generated by origin at the hexagon level. The maps should only display the significant values (i.e. p-value < 0.05).
With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).
Emerging Hot Spot Analysis (EHSA)
With reference to the passenger trips by origin at the hexagon level for the four time intervals given above:
Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,
Prepare EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant values (i.e. p-value < 0.05).
With reference to the EHSA maps and data visualisation prepared, describe the spatial patterns revealed (not more than 250 words per cluster).
I will be focusing on Task 1: Geovisualisation and Analysis and Task 2: Local Indicators of Spatial Association (LISA) Analysis.
Setting up
Geospatial Data Wrangling
Data Import, Extraction and Processing
Aspatial Data
Passenger Volume by Origin Destination Bus Stops from LTA DataMall
First, download the raw data and import it into the R environment using read_csv():
odbs <- read_csv("data/aspatial/origin_destination_bus_202308.csv")
#choosing Aug 2023 data as the data set
Next we will need to clean up the data set, first by converting the bus stop codes to factor for easier sorting, filtering and grouping:
odbs$ORIGIN_PT_CODE <- as.factor(odbs$ORIGIN_PT_CODE)
odbs$DESTINATION_PT_CODE <- as.factor(odbs$DESTINATION_PT_CODE)
#this changes the data type from chr to factor in the data field
The data required are:
Weekday morning peak (6am to 9am)
Weekday afternoon peak (5pm-8pm)
Weekend/Holiday morning peak (11am to 2pm)
Weekend/Holiday evening peak (4pm to 7pm)
We will next filter our odbs to get the required data per above:
odbs_peak <- odbs %>%
filter((DAY_TYPE == "WEEKDAY" &
((TIME_PER_HOUR >=6 & TIME_PER_HOUR <=9) |
(TIME_PER_HOUR >=17 & TIME_PER_HOUR <=20))) |
DAY_TYPE == "WEEKENDS/HOLIDAY" &
((TIME_PER_HOUR >=11 & TIME_PER_HOUR <=14) |
(TIME_PER_HOUR >=16 & TIME_PER_HOUR <=19))) %>%
  group_by(ORIGIN_PT_CODE) %>% #this allows me to extract all the trips generated by each origin
  summarize(TRIPS = sum(TOTAL_TRIPS)) #derives a new field with the aggregated trips
#just to take a look at the data table
datatable(odbs_peak)
To preserve the data set, we can save the output in rds format for future use:
write_rds(odbs_peak, "data/rds/odbs_peak.rds")
To import it back into the R environment:
odbs_peak <- read_rds("data/rds/odbs_peak.rds")
Geospatial Data
- Bus Stop Location from LTA DataMall
busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source
  `C:\cftoh\ISSS624\Take-home_Ex\Take-home_Ex1\data\geospatial'
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
#st_transform(crs = ?) takes 2D geometry data as input and returns it converted into the specified coordinate reference system; here EPSG:3414 is the projected coordinate system for Singapore (SVY21)
- Hexagon: a hexagon layer of 250m (this distance being the perpendicular distance between the centre of the hexagon and its edges) should be used to replace the relatively coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
Creating Hexagon
busstop_hexagon_grid = st_make_grid(busstop, c(250, 250), what = "polygons", square = FALSE)
hexagon_grid_sf = st_sf(geometry = busstop_hexagon_grid) %>%
  # add grid ID
  mutate(grid_id = seq_along(busstop_hexagon_grid))
Data Cleaning and Transformation
We will next use st_intersection() to perform a point-and-polygon overlay, combining the two data sets. The output is a point sf object.
hexagon <- st_intersection(hexagon_grid_sf, busstop) %>%
  select(1, 2, 4) %>%
  st_drop_geometry()
write_rds(hexagon, "data/rds/hexagon.rds")
Now we have three data sets, namely:
odbs_peak : The total number of trips from the Origin Bus Stop during the specified peak period.
busstop : The details of bus stops in Singapore, with their locations.
hexagon: A hexagon layer of 250m with bus stop location.
Performing the Relational Join (to update the attribute table of a geospatial data set with an aspatial data set)
Next we will combine this with our odbs_peak data frame, which shows the total number of trips from each bus stop during peak hours.
obs_peak <- left_join(odbs_peak, hexagon,
                      by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE) %>%
  group_by(grid_id) %>% #group by the grid_ids that have values
  summarize(TOTAL_TRIPS = sum(TRIPS)) #derives the total trips per grid_id
Next, as a data sanity check, we look for duplicate records:
duplicate <- obs_peak %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()
We noted there are duplicate records, so we retain only the unique rows:
obs_peak <- unique(obs_peak)
This removed the duplicate records from the data frame; all remaining records are unique.
However, upon closer inspection, we noted some apparent duplicates remain:
datatable(obs_peak)
Per the data table above, we can see that grid_id 4 and grid_id 128 share the same TOTAL_TRIPS. Upon closer checking, we noted that bus stop 25059 has a point coordinate that sits on the boundary between hexagon 4 and hexagon 128.

par(mfrow = c(1, 2))
plot(hexagon_grid_sf[[1]][[4]])
points(3970.122, 28063.28)
plot(hexagon_grid_sf[[1]][[128]])
points(3970.122, 28063.28)
We will keep the data points sitting on hexagon boundaries as they are, for now.
Next, we combine our obs_peak data frame onto the hexagon_grid_sf data frame to understand the total number of trips per origin bus stop during peak period in hexagon.
hexagon_obs_peak <- left_join(hexagon_grid_sf, obs_peak,
                              by = "grid_id") %>%
  drop_na("TOTAL_TRIPS", "grid_id") #remove the entries with NA in TOTAL_TRIPS and grid_id
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
map_honeycomb = tm_shape(hexagon_obs_peak) +
tm_fill(
col = "TOTAL_TRIPS",
palette = "Reds",
style = "cont",
title = "Number of Trips",
alpha = 0.6,
popup.vars = c("Number of TRIPS: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_scale_bar()
map_honeycomb_q = tm_shape(hexagon_obs_peak)+
tm_fill("TOTAL_TRIPS",
style = "quantile",
palette = "Reds",
title = "Number of Trips (Quantile)") +
tm_layout(main.title = "Passenger Trips during Peak Hours",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.3) +
tm_scale_bar() +
tm_grid(alpha =0.2)
tmap_arrange(map_honeycomb, map_honeycomb_q, asp=1, ncol=2)
Next, we take a closer look at the hexagon areas that have >100k total trips:
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
map_honeycomb = tm_shape(hexagon_obs_peak %>%
filter(TOTAL_TRIPS>100000)) +
tm_fill(
col = "TOTAL_TRIPS",
palette = "Reds",
style = "cont",
title = "Number of TRIPS",
id = "grid_id",
alpha = 0.6,
popup.vars = c("Number of TRIPS: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7)
map_honeycomb
Own Analysis
The map above shows that the areas with >100k total trips during peak hours are concentrated in key residential areas, or in areas with direct bus services to the city/CBD (for those commuting to school or work during peak hours).
It is also worth noting that the areas near the Singapore-Malaysia causeway also have a high number of total trips during peak periods, likely due to people residing in Malaysia heading to Singapore for work or study; this is likely driven by the weekday peak periods.
Local Indicators of Spatial Association (LISA) Analysis
The objectives are to:
Compute LISA of the passenger trips generated by origin at the hexagon level.
Display the LISA maps of the passenger trips generated by origin at the hexagon level. The maps should only display the significant values (i.e. p-value < 0.05).
With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).
Data Preparation
hexagon1 <- read_rds("data/rds/hexagon.rds")
hexagon1 <- left_join(hexagon1, hexagon_grid_sf,
                      by = "grid_id")
#this obtains a combined data frame with BUS_STOP_N, grid_id and hexagon geometry
#for simplicity, I will use the total trips taken during the peak periods (all timings, both weekdays and weekends)
hexagon1_obs_peak <- left_join(hexagon1, odbs_peak,
by = c("BUS_STOP_N" = "ORIGIN_PT_CODE")) %>%
rename(ORIGIN_BS = BUS_STOP_N) %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TRIPS), ORIGIN_BS = list(ORIGIN_BS))
#just retaining both the total trips and the list of bus stop covered by each hexagon
hexagon1_obs_peak <- left_join(hexagon1_obs_peak, hexagon_grid_sf, by = "grid_id")
Global Measures of Spatial Association
Step 1: Deriving contiguity spatial weights
nb_queen <- hexagon1_obs_peak %>%
mutate(nb = st_contiguity(geometry),
.before = 1)
summary(nb_queen$nb)
Neighbour list object:
Number of regions: 3131
Number of nonzero links: 9362
Percentage nonzero weights: 0.09549981
Average number of links: 2.990099
112 regions with no links:
4 7 9 11 12 17 28 29 36 40 43 46 47 55 62 72 99 158 177 189 205 217 244
245 255 324 346 356 408 421 471 503 505 506 509 510 517 524 575 598 685
722 738 749 750 761 783 835 874 899 922 1001 1024 1090 1093 1113 1126
1266 1287 1296 1346 1356 1378 1387 1426 1427 1428 1432 1454 1467 1483
1500 1501 1503 1520 1551 1865 1926 1940 1971 2015 2048 2049 2070 2075
2085 2099 2180 2214 2451 2646 2667 2691 2727 2728 2745 2752 2759 2891
2926 2958 3035 3047 3056 3068 3069 3079 3086 3105 3109 3120 3131
237 disjoint connected subgraphs
Link number distribution:
0 1 2 3 4 5 6
112 362 712 781 694 363 107
362 least connected regions:
1 2 3 5 6 10 15 23 24 27 32 33 35 37 38 42 44 48 53 74 75 77 79 85 98 108 109 112 117 118 135 137 140 143 144 145 148 151 153 155 156 160 162 181 190 197 198 204 209 212 213 224 228 229 237 254 259 264 265 271 272 296 310 318 334 335 336 343 354 368 369 377 381 385 389 390 407 416 418 436 470 494 495 499 500 501 502 518 519 525 535 540 542 565 573 591 592 606 610 611 625 628 634 635 639 643 658 667 682 698 708 736 746 747 751 758 763 767 772 794 812 813 814 819 859 862 875 881 890 893 900 906 907 938 939 944 953 962 978 980 983 984 998 1012 1016 1029 1044 1053 1060 1062 1067 1068 1077 1078 1083 1094 1106 1125 1136 1160 1166 1167 1170 1178 1189 1204 1205 1212 1216 1220 1221 1222 1223 1229 1236 1241 1242 1265 1271 1282 1288 1294 1303 1305 1310 1315 1317 1318 1325 1329 1344 1349 1365 1367 1368 1372 1385 1389 1390 1402 1403 1419 1423 1441 1449 1451 1452 1455 1458 1463 1480 1497 1499 1510 1550 1552 1553 1555 1565 1572 1605 1607 1608 1629 1656 1664 1665 1670 1687 1701 1716 1718 1734 1740 1761 1780 1781 1821 1844 1864 1877 1907 1912 1915 1943 1948 1957 1970 1988 2000 2005 2012 2014 2016 2024 2030 2031 2033 2043 2059 2069 2076 2097 2100 2102 2113 2114 2123 2124 2135 2136 2151 2178 2179 2181 2182 2187 2200 2201 2217 2218 2224 2227 2229 2240 2249 2260 2261 2275 2276 2296 2313 2330 2349 2386 2398 2439 2475 2498 2499 2511 2522 2530 2565 2571 2572 2590 2595 2635 2636 2638 2645 2655 2663 2664 2681 2697 2709 2712 2718 2721 2778 2781 2782 2816 2817 2828 2837 2858 2865 2870 2902 2906 2907 2943 2968 3005 3017 3026 3041 3045 3048 3049 3051 3052 3062 3075 3084 3090 3093 3095 3098 3099 3103 3107 3113 3114 3118 3119 3122 3127 3130 with 1 link
107 most connected regions:
171 219 258 302 323 472 529 549 555 557 584 599 717 816 836 850 867 888 940 974 976 993 995 1006 1013 1162 1180 1300 1312 1322 1323 1332 1333 1342 1351 1405 1417 1506 1521 1549 1641 1659 1663 1688 1727 1736 1748 1765 1804 1819 1839 1859 1861 1892 1896 1909 1923 1930 1944 1949 1959 1974 1991 2007 2020 2021 2051 2066 2116 2118 2206 2213 2219 2272 2278 2286 2298 2302 2362 2372 2373 2383 2389 2390 2409 2411 2442 2504 2534 2577 2783 2787 2801 2810 2822 2831 2832 2841 2850 2862 2919 2970 2971 2972 2978 2988 2995 with 6 links
The summary report above shows that there are 3131 hexagons. The most connected hexagons have 6 neighbours each, while 362 hexagons have only one neighbour and 112 hexagons have no neighbours at all.
nb_queen
# A tibble: 3,131 × 5
nb grid_id TOTAL_TRIPS ORIGIN_BS geometry
<nb> <int> <dbl> <list> <POLYGON [m]>
1 <int [1]> 4 549 <chr [1]> ((3845.122 27853.31, 3720.122 27925.…
2 <int [1]> 128 549 <chr [1]> ((4095.122 27853.31, 3970.122 27925.…
3 <int [1]> 316 210 <chr [1]> ((4470.122 28502.83, 4345.122 28575,…
4 <int [1]> 321 374 <chr [1]> ((4470.122 30667.89, 4345.122 30740.…
5 <int [1]> 440 2416 <chr [1]> ((4720.122 28502.83, 4595.122 28575,…
6 <int [1]> 444 780 <chr [1]> ((4720.122 30234.88, 4595.122 30307.…
7 <int [1]> 446 403 <chr [1]> ((4720.122 31100.9, 4595.122 31173.0…
8 <int [2]> 505 5054 <chr [2]> ((4845.122 30018.37, 4720.122 30090.…
9 <int [1]> 509 499 <chr [1]> ((4845.122 31750.42, 4720.122 31822.…
10 <int [1]> 569 4075 <chr [1]> ((4970.122 30667.89, 4845.122 30740.…
# ℹ 3,121 more rows
Deriving fixed distance weight
hexagon1_obs_peak
# A tibble: 3,131 × 4
grid_id TOTAL_TRIPS ORIGIN_BS geometry
<int> <dbl> <list> <POLYGON [m]>
1 4 549 <chr [1]> ((3845.122 27853.31, 3720.122 27925.48, 3720.1…
2 128 549 <chr [1]> ((4095.122 27853.31, 3970.122 27925.48, 3970.1…
3 316 210 <chr [1]> ((4470.122 28502.83, 4345.122 28575, 4345.122 …
4 321 374 <chr [1]> ((4470.122 30667.89, 4345.122 30740.06, 4345.1…
5 440 2416 <chr [1]> ((4720.122 28502.83, 4595.122 28575, 4595.122 …
6 444 780 <chr [1]> ((4720.122 30234.88, 4595.122 30307.05, 4595.1…
7 446 403 <chr [1]> ((4720.122 31100.9, 4595.122 31173.07, 4595.12…
8 505 5054 <chr [2]> ((4845.122 30018.37, 4720.122 30090.54, 4720.1…
9 509 499 <chr [1]> ((4845.122 31750.42, 4720.122 31822.59, 4720.1…
10 569 4075 <chr [1]> ((4970.122 30667.89, 4845.122 30740.06, 4845.1…
# ℹ 3,121 more rows
hexagon1_obs_peak <- hexagon1_obs_peak %>%
drop_na(TOTAL_TRIPS)
hexagon1_peak_sf <- st_as_sf(hexagon1_obs_peak)
geo <- st_geometry(hexagon1_peak_sf)
nb <- st_knn(geo, longlat = TRUE)
dists <- unlist(st_nb_dists(geo, nb))
summary(dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  250.0   250.0   250.0   260.4   250.0  4520.8
longitude <- map_dbl(hexagon1_obs_peak$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(hexagon1_obs_peak$geometry, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
#hexagon1_obs_peak$coords <- coords
#if we need to use centroids
k1 <- knn2nb(knearneigh(coords, k = 1))
k1dists <- unlist(nbdists(k1, coords))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  250.0   250.0   250.0   260.4   250.0  4520.8
hexagon1_obs_peak$ORIGIN_BS[match(max(k1dists), k1dists)][[1]]
[1] "46239"
wm_d43 <- dnearneigh(coords, 0, 43000)
wm_d43
Neighbour list object:
Number of regions: 3025
Number of nonzero links: 9147338
Percentage nonzero weights: 99.96408
Average number of links: 3023.913
From the output, we see that the average number of links is 3023.913; with a 43km upper bound, nearly every hexagon is treated as a neighbour of every other hexagon. This number is far too high and would skew the analysis.
str(wm_d43)
List of 3025
 $ : int [1:2998] 2 3 4 5 6 7 8 9 10 11 ...
 $ : int [1:3001] 1 3 4 5 6 7 8 9 10 11 ...
 $ : int [1:3006] 1 2 4 5 6 7 8 9 10 11 ...
 $ : int [1:3013] 1 2 3 5 6 7 8 9 10 11 ...
 $ : int [1:3008] 1 2 3 4 6 7 8 9 10 11 ...
[list output truncated]
- attr(*, "class")= chr "nb"
- attr(*, "region.id")= chr [1:3025] "1" "2" "3" "4" ...
- attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 43000)
- attr(*, "dnn")= num [1:2] 0 43000
- attr(*, "bounds")= chr [1:2] "GE" "LE"
- attr(*, "nbtype")= chr "distance"
- attr(*, "sym")= logi TRUE
Adaptive Distance-based Weight Matrix
To overcome the issue of the fixed distance weight matrix, where neighbours are unevenly distributed, we can directly control the number of neighbours using k-nearest neighbours, as shown in the code chunk below.
As a rule of thumb, we will set k = 8, i.e. all regions will have 8 neighbours.
knn8 <- knn2nb(knearneigh(coords, k=8))
knn8
Neighbour list object:
Number of regions: 3025
Number of nonzero links: 24200
Percentage nonzero weights: 0.2644628
Average number of links: 8
Non-symmetric neighbours list
Which spatial weight matrix to use?
Between contiguity-based and distance-based spatial weight matrices, we lean towards distance-based matrices. Within distance-based matrices, we will select the adaptive distance-based spatial weight matrix for our subsequent analysis, because:
Singapore's bus stops are not evenly distributed across the country; some areas have more bus stops and more trips. A contiguity-based matrix would therefore give the hexagons in certain areas many more neighbours than others (and, as seen above, leave 112 hexagons with no neighbours at all), which would likely skew our analysis. Distance-based methods are therefore preferred.
The fixed distance-based method requires an upper limit large enough for every region to have at least one neighbour; with the isolated hexagons here, that limit (43km) gives regions 3023.913 neighbours on average (out of 3025 regions), which is not meaningful.
Based on the above, we will select the adaptive distance-based spatial weight matrix.
Row-Standardised Weights Matrix
After selecting the weight matrix to use, we will now assign weights to each neighbouring polygon. Each neighbouring polygon is assigned an equal weight (style = "W") by assigning the fraction 1/(number of neighbours) to each neighbouring area. This is known as a row-standardised matrix: each row of the matrix sums to 1.
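The mechanics of row standardisation can be sketched with a toy 3-region neighbour matrix in base R (illustration only, not part of the analysis pipeline; nb2listw() does this internally):

```r
# Toy sketch of row standardisation (style = "W"):
# start from a binary neighbour matrix and divide each row by its
# number of neighbours, so every row sums to 1.
B <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE)  # region 2 has two neighbours
W <- B / rowSums(B)                  # row-standardised weights
rowSums(W)                           # each row sums to 1
```

Region 2's two neighbours each receive a weight of 1/2, while regions 1 and 3 give their single neighbour a weight of 1.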
rswm_knn8 <- nb2listw(knn8,
style = "W",
zero.policy = TRUE)
rswm_knn8
Characteristics of weights list object:
Neighbour list object:
Number of regions: 3025
Number of nonzero links: 24200
Percentage nonzero weights: 0.2644628
Average number of links: 8
Non-symmetric neighbours list
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 3025 9150625 3025 696.9062 12303.88
We will be using the row-standardised weight matrix for the next part of the analysis.
Moran's I statistic ranges from -1 to 1. If Moran's I is:
positive (I > 0): clustered; observations tend to be similar
negative (I < 0): dispersed; observations tend to be dissimilar
approximately zero: observations are arranged randomly over space
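To make the interpretation concrete, here is a hand-rolled Moran's I on toy data in base R (illustration only; the analysis below uses spdep::moran.test()). Four regions in a line with row-standardised rook weights, and values that are clearly clustered:

```r
# Toy Moran's I: I = (n/S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
# where z is the deviation from the mean and S0 the sum of all weights.
x <- c(1, 1, 10, 10)                 # low next to low, high next to high
W <- matrix(c(0,   1,   0,   0,
              0.5, 0,   0.5, 0,
              0,   0.5, 0,   0.5,
              0,   0,   1,   0),
            nrow = 4, byrow = TRUE)  # row-standardised chain neighbours
z <- x - mean(x)
I <- (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
I                                    # 0.5 > 0: clustered, as expected
```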
moran.test(hexagon1_obs_peak$TOTAL_TRIPS,
listw = rswm_knn8,
zero.policy = TRUE,
na.action = na.omit)
Moran I test under randomisation
data: hexagon1_obs_peak$TOTAL_TRIPS
weights: rswm_knn8
Moran I statistic standard deviate = 9.2483, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
7.730524e-02 -3.306878e-04 7.047012e-05
Since the p-value is < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% confidence level. This means the data are more spatially clustered than expected by chance alone. Since the Moran's I statistic is greater than 0, the observations are clustered, i.e. neighbouring values tend to be similar.
Computing Monte Carlo Moran’s I
set.seed(1234)
bperm_func = moran.mc(hexagon1_obs_peak$TOTAL_TRIPS,
listw = rswm_knn8,
nsim = 999,
zero.policy = TRUE,
na.action = na.omit)
bperm_func
Monte-Carlo simulation of Moran I
data: hexagon1_obs_peak$TOTAL_TRIPS
weights: rswm_knn8
number of simulations + 1: 1000
statistic = 0.077305, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Since the pseudo p-value (0.001) is < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% confidence level. This means the data are more spatially clustered than expected by chance alone.
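The logic behind moran.mc() can be sketched in base R: repeatedly shuffle the values over the fixed geography, recompute Moran's I each time, and compare the observed statistic against the simulated distribution. The toy chain geometry and values below are assumptions for illustration, not the exercise data:

```r
# Pseudo p-value via random permutations, mimicking moran.mc()
moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
n <- 12
B <- matrix(0, n, n)                       # chain of 12 regions
for (i in 1:(n - 1)) { B[i, i + 1] <- 1; B[i + 1, i] <- 1 }
W <- B / rowSums(B)                        # row-standardised weights
x <- c(rep(1, 6), rep(10, 6))              # strongly clustered values
set.seed(1234)
obs <- moran_i(x, W)                       # observed statistic
sims <- replicate(999, moran_i(sample(x), W))
p_pseudo <- (sum(sims >= obs) + 1) / (999 + 1)
```

Because the observed I sits in the extreme right tail of the permutation distribution, the pseudo p-value is small and the clustering is judged significant.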
Computing local Moran’s I
hexagon1_peak_sf <- hexagon1_peak_sf %>%
drop_na(TOTAL_TRIPS)
localMI_func <- localmoran(hexagon1_peak_sf$TOTAL_TRIPS, rswm_knn8)
head(localMI_func)
          Ii          E.Ii     Var.Ii     Z.Ii Pr(z != E(Ii))
1 0.1910008 -6.973375e-05 0.02630518 1.178075 0.2387665
2 0.1910008 -6.973375e-05 0.02630518 1.178075 0.2387665
3 0.1994463 -7.202790e-05 0.02717053 1.210413 0.2261203
4 0.1930292 -7.091341e-05 0.02675014 1.180645 0.2377436
5 0.1759933 -5.776431e-05 0.02179030 1.192634 0.2330128
6 0.1982239 -6.819174e-05 0.02572354 1.236346 0.2163300
The localmoran() function returns a matrix of values whose columns are:
Ii: the local Moran's I statistic
E.Ii: the expectation of the local Moran statistic under the randomisation hypothesis
Var.Ii: the variance of the local Moran statistic under the randomisation hypothesis
Z.Ii: the standard deviate of the local Moran statistic
Pr(): the p-value of the local Moran statistic
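On toy data, the Ii column can be reproduced by hand; with row-standardised weights, the local values average to the global Moran's I. This is a base R illustration only, using an assumed 4-region chain rather than the exercise data:

```r
# Local Moran's I: Ii = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2)/n
x <- c(1, 1, 10, 10)
W <- matrix(c(0,   1,   0,   0,
              0.5, 0,   0.5, 0,
              0,   0.5, 0,   0.5,
              0,   0,   1,   0),
            nrow = 4, byrow = TRUE)    # row-standardised chain neighbours
z <- x - mean(x)
m2 <- sum(z^2) / length(x)             # variance term used by local Moran
Ii <- (z / m2) * as.vector(W %*% z)    # one local statistic per region
mean(Ii)                               # recovers the global I of 0.5
```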
Mapping Local Moran’s I values and p-values
hpobs.localMI_func <- cbind(hexagon1_peak_sf, localMI_func) %>%
  rename(Pr.Ii = Pr.z....E.Ii..)
Visualizing local Moran's I and p-values
localMI_func.map <- tm_shape(hpobs.localMI_func) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran I Statistics") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Moran's I Map",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
pvalue_func.map <- tm_shape(hpobs.localMI_func) +
tm_fill(col = "Pr.Ii",
breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "Local Moran's I p-values") +
tm_borders(alpha = 0.3)+
tm_layout(main.title = "Local Moran's I p-values Map",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(localMI_func.map, pvalue_func.map, asp = 1, ncol = 2)
Visualising the LISA map
A LISA map is a categorical map showing outliers and clusters. There are two types of outliers, namely High-Low and Low-High outliers. Likewise, there are two types of clusters, namely High-High and Low-Low clusters. In effect, a LISA map is an interpreted map combining the local Moran's I of geographical areas with their respective p-values.
Data Preparation
quadrant <- vector(mode = 'numeric', length = nrow(localMI_func))
hexagon1_peak_sf$lag_pct_func <- lag.listw(rswm_knn8, hexagon1_peak_sf$TOTAL_TRIPS)
DV_func <- hexagon1_peak_sf$lag_pct_func - mean(hexagon1_peak_sf$lag_pct_func)
LM_I_func <- localMI_func[,1]
signif <- 0.05
quadrant[DV_func < 0 & LM_I_func > 0] <- 1 #low-low
quadrant[DV_func > 0 & LM_I_func < 0] <- 2 #low-high (low value among high-value neighbours)
quadrant[DV_func < 0 & LM_I_func < 0] <- 3 #high-low (high value among low-value neighbours)
quadrant[DV_func > 0 & LM_I_func > 0] <- 4 #high-high
quadrant[localMI_func[,5] > signif] <- 0 #insignificant
To prepare the LISA cluster map:
#Assign each region to its respective quadrant
hpobs.localMI_func$quadrant <- quadrant
#Set the colours--one for each quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
tm_shape(hpobs.localMI_func) +
  tm_fill(col = "quadrant",
          style = "cat",
          palette = colors[c(sort(unique(quadrant)))+1],
          labels = clusters[c(sort(unique(quadrant)))+1]) +
  tm_borders(alpha = 0.3) +
  tm_layout(main.title = "LISA Cluster Map",
            main.title.size = 1,
            main.title.position = "center",
            legend.height = 0.45,
            legend.width = 0.35,
            frame = TRUE)
Interpretation of Results
At a glance, the LISA map shows the larger clusters situated near the top, left and right of the map. These are key residential areas where many commuters take buses to school or work during peak hours. We also notice no significant spatial clusters in the middle and bottom of the map; these are less residential areas, mostly town areas that people head into (destination) rather than leave from (origin) during peak hours.
Additional
Note: This is purely for my own curiosity.
mpsz <- st_read(dsn = "data/geospatial", layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source
  `C:\cftoh\ISSS624\Take-home_Ex\Take-home_Ex1\data\geospatial'
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
obs_peak <- left_join(odbs_peak, busstop_mpsz,
                      by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C)
obs_peak <- unique(obs_peak)
mpsz_obs_peak <- left_join(mpsz,
                           obs_peak,
                           by = c("SUBZONE_C" = "ORIGIN_SZ"))
Preparing a choropleth map showing the distribution of passenger trips at planning sub-zone level:
mpsz_obs_peak %>%
  drop_na(TRIPS)
Simple feature collection with 5013 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 21448.47 xmax: 55941.94 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
5 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
6 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
7 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
8 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
9 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
10 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
REGION_C ORIGIN_BS TRIPS geometry
1 CR 13089 7899 MULTIPOLYGON (((28481.45 30...
2 CR 13099 15531 MULTIPOLYGON (((28481.45 30...
3 CR 04321 10466 MULTIPOLYGON (((28087.34 30...
4 CR 06129 7568 MULTIPOLYGON (((28087.34 30...
5 CR 06151 6104 MULTIPOLYGON (((28087.34 30...
6 CR 06159 11722 MULTIPOLYGON (((28087.34 30...
7 CR 06169 14622 MULTIPOLYGON (((28087.34 30...
8 CR 13079 5238 MULTIPOLYGON (((28087.34 30...
9 CR 13109 16274 MULTIPOLYGON (((28087.34 30...
10 CR 13119 3949 MULTIPOLYGON (((28087.34 30...
tm_shape(mpsz_obs_peak)+
tm_fill("TRIPS",
style = "quantile",
palette = "Reds",
title = "Passenger trips") +
tm_layout(main.title = "Passenger Trips during Peak Hours",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.3) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authority (URA)\n and Population data from Department of Statistics (DOS)",
position = c("left", "bottom"))